Extend On-Device Sampling Support to more Causal Language Models #553

quic-sanising · 2025-09-04T20:18:50Z

📢 Expanded On-Device Sampling Support in QEfficient

Excited to share that On-Device Sampling, previously available only for LlamaForCausalLM, is now supported across a broader set of architectures! With this change, these models can run the sampling step directly on the QAIC device for faster, more efficient inference.
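For readers new to the feature, here is a minimal host-side sketch of the two sampling modes exercised by this PR's tests (greedy and random). It is plain Python for illustration only and is not the QEfficient on-device implementation; the function names and the `temperature` parameter are assumptions made for this example.

```python
import math
import random


def greedy_sample(logits):
    """Return the index of the largest logit (deterministic)."""
    return max(range(len(logits)), key=lambda i: logits[i])


def random_sample(logits, temperature=1.0, rng=None):
    """Draw a token index from softmax(logits / temperature)."""
    rng = rng or random.Random()
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    # random.Random.choices normalizes the weights internally.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [0.1, 2.5, -1.0, 0.7]
print("greedy:", greedy_sample(logits))
print("random:", random_sample(logits, temperature=0.8, rng=random.Random(0)))
```

Greedy sampling always picks the same token for the same logits, while random sampling depends on the random number generator, which is why the two modes need separate ground truth in tests.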

✅ Newly Supported Architectures:

FalconForCausalLM
GemmaForCausalLM
GPT2LMHeadModel
GPTJForCausalLM
GraniteForCausalLM
GraniteMoeForCausalLM
LlamaForCausalLM (existing)
MptForCausalLM
Phi3ForCausalLM
Qwen2ForCausalLM

⚠️ Architectures Still Pending Support:

GPTBigCodeForCausalLM
InternVLChatModel
MistralForCausalLM
MixtralForCausalLM
LlamaSwiftKVForCausalLM
Grok1ModelForCausalLM

We’re actively working to extend support to these models. Contributions, feedback, and testing from the community are always welcome to help accelerate this effort!
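For quick reference, the two lists above can be expressed as a small lookup. This is only a sketch: the architecture names are copied from this description, but the helper `sampling_support_status` is hypothetical and not part of the QEfficient API.

```python
# Architecture names copied from the lists in this PR description.
SUPPORTED = frozenset({
    "FalconForCausalLM",
    "GemmaForCausalLM",
    "GPT2LMHeadModel",
    "GPTJForCausalLM",
    "GraniteForCausalLM",
    "GraniteMoeForCausalLM",
    "LlamaForCausalLM",
    "MptForCausalLM",
    "Phi3ForCausalLM",
    "Qwen2ForCausalLM",
})

PENDING = frozenset({
    "GPTBigCodeForCausalLM",
    "InternVLChatModel",
    "MistralForCausalLM",
    "MixtralForCausalLM",
    "LlamaSwiftKVForCausalLM",
    "Grok1ModelForCausalLM",
})


def sampling_support_status(architecture: str) -> str:
    """Hypothetical helper: classify an architecture per the lists above."""
    if architecture in SUPPORTED:
        return "supported"
    if architecture in PENDING:
        return "pending"
    return "unknown"


print(sampling_support_status("Qwen2ForCausalLM"))
print(sampling_support_status("MistralForCausalLM"))
```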

Signed-off-by: quic-sanising <[email protected]>

Signed-off-by: sanising <[email protected]>

Signed-off-by: Dhiraj Kumar Sah <[email protected]>

quic-sanising · 2025-09-04T20:25:50Z

Depends on PR #463.

quic-hemagnih · 2025-10-30T08:08:59Z

Have we tested these models for On-Device Sampling? Have you added test cases in the CI for these models?

quic-hemagnih · 2025-10-30T08:18:00Z

Also, please rebase it. Once you post your testing confirmation, we can go ahead and merge this PR.

quic-hemagnih

Please confirm:

1. Have you tested the newly added models for On-Device Sampling?
2. Please rebase.
3. Can you add a few models to the CI?

quic-sanising · 2025-10-30T11:04:46Z

1. Yes, I have tested it locally, and the feature works for each of the 10 architectures mentioned above.
2. Rebase done.
3. As far as I know, you prefer lightweight models for the CI. For each new architecture, we could add one model config to test (similar to the TinyLlama model configs for the LlamaForCausalLM architecture in the current CI). If this sounds good, can you please provide a list of the models you want tested? Keep in mind that each new model config needs two sets of ground truth: one for greedy sampling and one for random sampling. Let me know how you want to proceed.
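To make the size of that test matrix concrete, here is a rough sketch assuming one lightweight model per architecture and two ground-truth sets per model. The model names are placeholders rather than a list agreed on in this thread, and `SamplingCase` and `build_cases` are hypothetical names, not existing test utilities.

```python
from dataclasses import dataclass
from itertools import product

# Placeholder model names (assumption): the actual CI models are still to be decided.
CANDIDATE_MODELS = {
    "LlamaForCausalLM": "<tiny-llama-model>",
    "GPT2LMHeadModel": "<tiny-gpt2-model>",
    "Qwen2ForCausalLM": "<tiny-qwen2-model>",
}

# Each model config needs ground truth for both modes.
SAMPLING_MODES = ("greedy", "random")


@dataclass(frozen=True)
class SamplingCase:
    architecture: str
    model_name: str
    mode: str

    @property
    def ground_truth_key(self) -> str:
        """Key under which this case's expected outputs would be stored."""
        return f"{self.architecture}/{self.mode}"


def build_cases(models):
    """Return one test case per (model config, sampling mode) pair."""
    return [
        SamplingCase(arch, name, mode)
        for (arch, name), mode in product(models.items(), SAMPLING_MODES)
    ]


cases = build_cases(CANDIDATE_MODELS)
for case in cases:
    print(case.ground_truth_key)
```

With this layout, every architecture added to the CI contributes exactly two cases, so covering all 10 supported architectures would mean maintaining 20 ground-truth sets.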

quic-sanising and others added 30 commits June 18, 2025 13:38

Add sampler transform test

8417d8f

Signed-off-by: quic-sanising <[email protected]>

Merge branch 'main' into ods-unit-tests

27d8dd5

Add example script

067f9b5

Signed-off-by: sanising <[email protected]>

Update docs

931860f

Signed-off-by: sanising <[email protected]>

Enable On Device Sampling for _continuous_batching_execution()

79b6c95

Signed-off-by: sanising <[email protected]>

Disable On Device Sampling for _regular_model_execution()

75eac30

Signed-off-by: sanising <[email protected]>

Use same sampling parameters for each sequence in a batch

eb6e2eb

Signed-off-by: sanising <[email protected]>

Enable On Device Sampling for _regular_model_execution()

48b35e3

Signed-off-by: sanising <[email protected]>

Add test for greedy sampling

c83a631

Signed-off-by: sanising <[email protected]>

Add test for random sampling

f698a24

Signed-off-by: sanising <[email protected]>

Remove else block

7b34a07

Signed-off-by: sanising <[email protected]>

Merge branch 'main' into ods-unit-tests

5fa7269

Signed-off-by: sanising <[email protected]>

Reformat code

0ee201a

Signed-off-by: sanising <[email protected]>

Merge branch 'quic:main' into ods-unit-tests

c074768

Move sampling operations, inputs, and validation functions to utils

115505e

Signed-off-by: sanising <[email protected]>

Change model to TinyLlama

3ac7503

Signed-off-by: sanising <[email protected]>

Add header

02669e0

Signed-off-by: sanising <[email protected]>

Reformat code

137cc4a

Signed-off-by: sanising <[email protected]>

Merge branch 'quic:main' into ods-unit-tests

54a926a

Update linter

6acf446

Signed-off-by: sanising <[email protected]>

Merge branch 'quic:main' into ods-unit-tests

6083f5b

Remove device_id

c2d7e83

Signed-off-by: sanising <[email protected]>

Remove redundant line

1069109

Signed-off-by: sanising <[email protected]>

Merge branch 'quic:main' into ods-unit-tests

7d67132

Merge branch 'main' into ods-unit-tests

0e3f257

Remove redundant reinitialization of output buffers

908e67e

Signed-off-by: sanising <[email protected]>

Merge branch 'main' into ods-unit-tests

a8e55da

Add qaic_config to model hash

f3f89d3

Signed-off-by: sanising <[email protected]>

Merge branch 'main' into ods-unit-tests

81ae15a

Change config

c485bfd

Signed-off-by: sanising <[email protected]>

sanising and others added 6 commits August 25, 2025 17:37

Remove pretrained_model_name_or_path from qaic_config

0e3b383

Signed-off-by: sanising <[email protected]>

Revert changes to model hash

7d91470

Signed-off-by: sanising <[email protected]>

Added qaic_config to hash parameters via inclusion list.

e36add0

Signed-off-by: Dhiraj Kumar Sah <[email protected]>

Added qaic_config in manual hash tests for causal_lm dummy models.

127ec74

Signed-off-by: Dhiraj Kumar Sah <[email protected]>

Use different config for each test

dad96ca

Signed-off-by: sanising <[email protected]>

Add On Device Sampling support for more CausalLM models

af854e8

Signed-off-by: sanising <[email protected]>

quic-sanising changed the title ~~Extend On Device Sampling Support to more Causal Language Models~~ Extend On-Device Sampling Support to more Causal Language Models Sep 4, 2025

quic-sanising changed the base branch from main to ods-unit-tests September 4, 2025 20:26

quic-sanising changed the base branch from ods-unit-tests to main September 4, 2025 20:26

Merge branch 'main' into ods-extend

538c69f

quic-sanising marked this pull request as ready for review September 18, 2025 18:54

quic-sanising requested review from ochougul, quic-amitraj, quic-hemagnih and quic-rishinr as code owners September 18, 2025 18:54

quic-hemagnih requested changes Oct 30, 2025

Merge branch 'main' into ods-extend

af06d42

Merge branch 'main' into ods-extend

ae4cf51

quic-hemagnih approved these changes Oct 31, 2025

quic-hemagnih merged commit 35d8fd8 into quic:main Nov 1, 2025
5 checks passed
